Median-KNN Regressor-SMOTE-Tomek Links for Handling Missing and Imbalanced Data in Air Quality Prediction
نویسندگان
چکیده
The Air Quality Index (AQI) dataset contains information on measurements of pollutants and ambient air quality conditions at certain location that can be used to predict quality. Unfortunately, this often has many missing observations imbalanced classes. Both these problems affect the performance prediction model. In particular, predictions for minority class are very important because inaccurate fatal or cause big losses. Moreover, data may lead biased results. This paper proposes single imputation median multiple imputations k-Nearest Neighbor (KNN) regressor handle values less than equal 10% more 10%, respectively. At same time, SMOTE-Tomek Links address class. These proposed approaches both issues then assess India AQI using Naive Bayes (NB), KNN, C4.5. five treatments show method Median-KNN regressor-SMOTE-Tomek is able improve other words, succeeds in overcoming imbalance.
منابع مشابه
Safe-Level-SMOTE: Safe-Level-Synthetic Minority Over-Sampling TEchnique for Handling the Class Imbalanced Problem
The class imbalanced problem occurs in various disciplines when one of target classes has a tiny number of instances comparing to other classes. A typical classifier normally ignores or neglects to detect a minority class due to the small number of class instances. SMOTE is one of over-sampling techniques that remedies this situation. It generates minority instances within the overlapping regio...
متن کاملConversion of Imbalanced Data Into A Stream Using SMOTE Algorithm
Machine learning approach has got major importance when distribution of data is unknown. Classification of data from the data set causes some problem when distribution of data is unknown. Characterization of raw data relates to whether the data can take on only discrete values or whether the data is continuous. In real world application data drawn from non-stationary distribution, causes the pr...
متن کاملClass Confidence Weighted kNN Algorithms for Imbalanced Data Sets
In this paper, a novel k -nearest neighbors (kNN) weighting strategy is proposed for handling the problem of class imbalance. When dealing with highly imbalanced data, a salient drawback of existing kNN algorithms is that the class with more frequent samples tends to dominate the neighborhood of a test instance in spite of distance measurements, which leads to suboptimal classification performa...
متن کاملOversampling for Imbalanced Learning Based on K-Means and SMOTE
Learning from class-imbalanced data continues to be a common and challenging problem in supervised learning as standard classification algorithms are designed to handle balanced class distributions. While different strategies exist to tackle this problem, methods which generate artificial data to achieve a balanced class distribution are more versatile than modifications to the classification a...
متن کاملAddressing data complexity for imbalanced data sets: analysis of SMOTE-based oversampling and evolutionary undersampling
In the classification framework there are problems in which the number of examples per class is not equitably distributed, formerly known as imbalanced data sets. This situation is a handicap when trying to identify the minority classes, as the learning algorithms are not usually adapted to such characteristics. An usual approach to deal with the problem of imbalanced data sets is the use of a ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Symmetry
سال: 2023
ISSN: ['0865-4824', '2226-1877']
DOI: https://doi.org/10.3390/sym15040887